Language Change Quantification Using Time-separated Parallel Translations

Authors

Kemal Altintas

Fazli Can

Jon M. Patton

Abstract

We introduce a systematic approach to language change quantification by studying unconsciously used language features in time-separated parallel translations. For this purpose, we use objective style markers such as vocabulary richness and the lengths of words, word stems, and suffixes, and employ statistical methods to measure their changes over time. In this study, we focus on the change in Turkish in the second half of the twentieth century. To obtain word stems, we first introduce various stemming techniques and show that they are highly effective. Our statistical analyses show that over time, for both text and lexicon, Turkish words have become significantly longer and word stems significantly shorter. We also show that suffix lengths have become significantly longer for types and that the vocabulary richness based on word stems has shrunk significantly. These observations indicate that contemporary Turkish uses more suffixes to compensate for fewer stems, preserving the expressive power of the language at the same level. Our approach can be adapted to quantify change in other languages.
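As a rough illustration of the kind of style markers the abstract names, the sketch below computes mean word length over tokens (the running text) and over types (the distinct words, i.e. the lexicon), plus a type-token ratio as one simple stand-in for vocabulary richness. It is not the authors' implementation: the function names `tokenize` and `style_markers` are hypothetical, the type-token ratio is an assumed richness measure, the sample strings are toy data, and no stemming or significance testing is performed.

```python
import re
from statistics import mean


def tokenize(text):
    """Split text into lowercase alphabetic word tokens.

    Assumption: plain str.lower() is used for simplicity. It does not
    apply Turkish casing rules (dotted and dotless i), so real Turkish
    text would need locale-aware case folding.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def style_markers(text):
    """Return simple style markers for one text (hypothetical helper).

    Raises statistics.StatisticsError if the text contains no words.
    """
    tokens = tokenize(text)   # every word occurrence
    types = set(tokens)       # distinct words
    return {
        "tokens": len(tokens),
        "types": len(types),
        "mean_token_length": mean(len(w) for w in tokens),
        "mean_type_length": mean(len(w) for w in types),
        "type_token_ratio": len(types) / len(tokens),
    }


# Toy stand-ins for an older and a newer translation of the same passage.
older = "the cat sat on the mat"
newer = "the cats were sitting on the mats"

for label, text in (("older", older), ("newer", newer)):
    print(label, style_markers(text))
```

With parallel translations of the same source, markers like these could be computed per text block for each translation and the paired values compared with a statistical test; which test is appropriate depends on the data and is not specified here.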

Similar resources

1 Alter 2 Loosen 3 Change Sequence 1 Alter 2 Loosen 3 Change Sequence Means 2 Loosen 3 Change Means

We present discourse annotation work aimed at constructing a parallel corpus of Rhetorical Structure trees for a collection of Japanese texts and their corresponding English translations. We discuss implications of our empirical findings for the task of text planning in the context of implementing multilingual natural language generation systems.

Translating Collocation using Monolingual and Parallel Corpus

In this paper, we propose a method for translating a given verb-noun collocation based on a parallel corpus and an additional monolingual corpus. Our approach involves two models to generate collocation translations. The combination translation model generates combined translations of the collocate and the base word, and filters translations by a target language model from a monolingual corpus,...

Portable Parallel Translation Machine for Multi-Dictionary Systems

This paper implements an open source multi-dictionary parallel-translation machine using the Python programming language. The implementation parallelizes the translations of English words into three different languages (German, Ibibio and French). The research model has adaptability for n-languages, which could be implemented by adding n-process threads to the current design and building ndicti...

Automatic Extraction of Medical Term Variants from Multilingual Parallel Translations

The extraction of terms and their variants is an important issue in various applications of natural language processing (NLP), such as question answering and information retrieval. In this chapter we discuss a method to automatically extract medical terms and their variants from a multilingual corpus of parallel translations. As a first step terms are extracted using a pattern-based approach. I...

Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion

Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has supported research on statistical machine translations and other NLP applications by creating and distributing a large amount of parallel text resources for the research communities. However, manual translations are v...

Volume 22

Publication date: 2007
